Text File | 1992-02-29 | 39.5 KB | 1,209 lines |
- Newsgroups: comp.sources.unix
- From: mjd@saul.cis.upenn.edu (Mark-Jason Dominus)
- Subject: v25i136: classify - compare groups of files and classify them
- Sender: unix-sources-moderator@pa.dec.com
- Approved: vixie@pa.dec.com
-
- Submitted-By: mjd@saul.cis.upenn.edu (Mark-Jason Dominus)
- Posting-Number: Volume 25, Issue 136
- Archive-Name: classify
-
- #! /bin/sh
- # This is a shell archive. Remove anything before this line, then unpack
- # it by saving it into a file and typing "sh file". To overwrite existing
- # files, type "sh file -c". You can also feed this as standard input via
- # unshar, or by typing "sh <file", e.g. If this archive is complete, you
- # will see the following message at the end:
- # "End of shell archive."
- # Contents: COPYING Makefile README classify.1 classify.c test0 test1
- # test2 test3 test4
- # Wrapped by vixie@cognition.pa.dec.com on Sat Feb 29 20:19:14 1992
- PATH=/bin:/usr/bin:/usr/ucb ; export PATH
- if test -f 'COPYING' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'COPYING'\"
- else
- echo shar: Extracting \"'COPYING'\" \(12488 characters\)
- sed "s/^X//" >'COPYING' <<'END_OF_FILE'
- X
- X GNU GENERAL PUBLIC LICENSE
- X Version 1, February 1989
- X
- X Copyright (C) 1989 Free Software Foundation, Inc.
- X 675 Mass Ave, Cambridge, MA 02139, USA
- X Everyone is permitted to copy and distribute verbatim copies
- X of this license document, but changing it is not allowed.
- X
- X Preamble
- X
- X The license agreements of most software companies try to keep users
- at the mercy of those companies. By contrast, our General Public
- License is intended to guarantee your freedom to share and change free
- software--to make sure the software is free for all its users. The
- General Public License applies to the Free Software Foundation's
- software and to any other program whose authors commit to using it.
- You can use it for your programs, too.
- X
- X When we speak of free software, we are referring to freedom, not
- price. Specifically, the General Public License is designed to make
- sure that you have the freedom to give away or sell copies of free
- software, that you receive source code or can get it if you want it,
- that you can change the software or use pieces of it in new free
- programs; and that you know you can do these things.
- X
- X To protect your rights, we need to make restrictions that forbid
- anyone to deny you these rights or to ask you to surrender the rights.
- These restrictions translate to certain responsibilities for you if you
- distribute copies of the software, or if you modify it.
- X
- X For example, if you distribute copies of a such a program, whether
- gratis or for a fee, you must give the recipients all the rights that
- you have. You must make sure that they, too, receive or can get the
- source code. And you must tell them their rights.
- X
- X We protect your rights with two steps: (1) copyright the software, and
- X(2) offer you this license which gives you legal permission to copy,
- distribute and/or modify the software.
- X
- X Also, for each author's protection and ours, we want to make certain
- that everyone understands that there is no warranty for this free
- software. If the software is modified by someone else and passed on, we
- want its recipients to know that what they have is not the original, so
- that any problems introduced by others will not reflect on the original
- authors' reputations.
- X
- X The precise terms and conditions for copying, distribution and
- modification follow.
- X
- X GNU GENERAL PUBLIC LICENSE
- X TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
- X
- X 0. This License Agreement applies to any program or other work which
- contains a notice placed by the copyright holder saying it may be
- distributed under the terms of this General Public License. The
- X"Program", below, refers to any such program or work, and a "work based
- on the Program" means either the Program or any work containing the
- Program or a portion of it, either verbatim or with modifications. Each
- licensee is addressed as "you".
- X
- X 1. You may copy and distribute verbatim copies of the Program's source
- code as you receive it, in any medium, provided that you conspicuously and
- appropriately publish on each copy an appropriate copyright notice and
- disclaimer of warranty; keep intact all the notices that refer to this
- General Public License and to the absence of any warranty; and give any
- other recipients of the Program a copy of this General Public License
- along with the Program. You may charge a fee for the physical act of
- transferring a copy.
- X
- X 2. You may modify your copy or copies of the Program or any portion of
- it, and copy and distribute such modifications under the terms of Paragraph
- X1 above, provided that you also do the following:
- X
- X a) cause the modified files to carry prominent notices stating that
- X you changed the files and the date of any change; and
- X
- X b) cause the whole of any work that you distribute or publish, that
- X in whole or in part contains the Program or any part thereof, either
- X with or without modifications, to be licensed at no charge to all
- X third parties under the terms of this General Public License (except
- X that you may choose to grant warranty protection to some or all
- X third parties, at your option).
- X
- X c) If the modified program normally reads commands interactively when
- X run, you must cause it, when started running for such interactive use
- X in the simplest and most usual way, to print or display an
- X announcement including an appropriate copyright notice and a notice
- X that there is no warranty (or else, saying that you provide a
- X warranty) and that users may redistribute the program under these
- X conditions, and telling the user how to view a copy of this General
- X Public License.
- X
- X d) You may charge a fee for the physical act of transferring a
- X copy, and you may at your option offer warranty protection in
- X exchange for a fee.
- X
- Mere aggregation of another independent work with the Program (or its
- derivative) on a volume of a storage or distribution medium does not bring
- the other work under the scope of these terms.
- X
- X 3. You may copy and distribute the Program (or a portion or derivative of
- it, under Paragraph 2) in object code or executable form under the terms of
- Paragraphs 1 and 2 above provided that you also do one of the following:
- X
- X a) accompany it with the complete corresponding machine-readable
- X source code, which must be distributed under the terms of
- X Paragraphs 1 and 2 above; or,
- X
- X b) accompany it with a written offer, valid for at least three
- X years, to give any third party free (except for a nominal charge
- X for the cost of distribution) a complete machine-readable copy of the
- X corresponding source code, to be distributed under the terms of
- X Paragraphs 1 and 2 above; or,
- X
- X c) accompany it with the information you received as to where the
- X corresponding source code may be obtained. (This alternative is
- X allowed only for noncommercial distribution and only if you
- X received the program in object code or executable form alone.)
- X
- Source code for a work means the preferred form of the work for making
- modifications to it. For an executable file, complete source code means
- all the source code for all modules it contains; but, as a special
- exception, it need not include source code for modules which are standard
- libraries that accompany the operating system on which the executable
- file runs, or for standard header files or definitions files that
- accompany that operating system.
- X
- X 4. You may not copy, modify, sublicense, distribute or transfer the
- Program except as expressly provided under this General Public License.
- Any attempt otherwise to copy, modify, sublicense, distribute or transfer
- the Program is void, and will automatically terminate your rights to use
- the Program under this License. However, parties who have received
- copies, or rights to use copies, from you under this General Public
- License will not have their licenses terminated so long as such parties
- remain in full compliance.
- X
- X 5. By copying, distributing or modifying the Program (or any work based
- on the Program) you indicate your acceptance of this license to do so,
- and all its terms and conditions.
- X
- X 6. Each time you redistribute the Program (or any work based on the
- Program), the recipient automatically receives a license from the original
- licensor to copy, distribute or modify the Program subject to these
- terms and conditions. You may not impose any further restrictions on the
- recipients' exercise of the rights granted herein.
- X
- X 7. The Free Software Foundation may publish revised and/or new versions
- of the General Public License from time to time. Such new versions will
- be similar in spirit to the present version, but may differ in detail to
- address new problems or concerns.
- X
- XEach version is given a distinguishing version number. If the Program
- specifies a version number of the license which applies to it and "any
- later version", you have the option of following the terms and conditions
- either of that version or of any later version published by the Free
- Software Foundation. If the Program does not specify a version number of
- the license, you may choose any version ever published by the Free Software
- XFoundation.
- X
- X 8. If you wish to incorporate parts of the Program into other free
- programs whose distribution conditions are different, write to the author
- to ask for permission. For software which is copyrighted by the Free
- Software Foundation, write to the Free Software Foundation; we sometimes
- make exceptions for this. Our decision will be guided by the two goals
- of preserving the free status of all derivatives of our free software and
- of promoting the sharing and reuse of software generally.
- X
- X NO WARRANTY
- X
- X 9. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
- XFOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
- OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
- PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
- OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
- MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
- TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
- PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
- REPAIR OR CORRECTION.
- X
- X 10. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
- WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
- REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
- INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
- OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
- TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
- YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
- PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
- POSSIBILITY OF SUCH DAMAGES.
- X
- X END OF TERMS AND CONDITIONS
- X
- X Appendix: How to Apply These Terms to Your New Programs
- X
- X If you develop a new program, and you want it to be of the greatest
- possible use to humanity, the best way to achieve this is to make it
- free software which everyone can redistribute and change under these
- terms.
- X
- X To do so, attach the following notices to the program. It is safest to
- attach them to the start of each source file to most effectively convey
- the exclusion of warranty; and each file should have at least the
- X"copyright" line and a pointer to where the full notice is found.
- X
- X <one line to give the program's name and a brief idea of what it does.>
- X Copyright (C) 19yy <name of author>
- X
- X This program is free software; you can redistribute it and/or modify
- X it under the terms of the GNU General Public License as published by
- X the Free Software Foundation; either version 1, or (at your option)
- X any later version.
- X
- X This program is distributed in the hope that it will be useful,
- X but WITHOUT ANY WARRANTY; without even the implied warranty of
- X MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- X GNU General Public License for more details.
- X
- X You should have received a copy of the GNU General Public License
- X along with this program; if not, write to the Free Software
- X Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- X
- Also add information on how to contact you by electronic and paper mail.
- X
- If the program is interactive, make it output a short notice like this
- when it starts in an interactive mode:
- X
- X Gnomovision version 69, Copyright (C) 19xx name of author
- X Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- X This is free software, and you are welcome to redistribute it
- X under certain conditions; type `show c' for details.
- X
- The hypothetical commands `show w' and `show c' should show the
- appropriate parts of the General Public License. Of course, the
- commands you use may be called something other than `show w' and `show
- c'; they could even be mouse-clicks or menu items--whatever suits your
- program.
- X
- You should also get your employer (if you work as a programmer) or your
- school, if any, to sign a "copyright disclaimer" for the program, if
- necessary. Here a sample; alter the names:
- X
- X Yoyodyne, Inc., hereby disclaims all copyright interest in the
- X program `Gnomovision' (a program to direct compilers to make passes
- X at assemblers) written by James Hacker.
- X
- X <signature of Ty Coon>, 1 April 1989
- X Ty Coon, President of Vice
- X
- That's all there is to it!
- END_OF_FILE
- if test 12488 -ne `wc -c <'COPYING'`; then
- echo shar: \"'COPYING'\" unpacked with wrong size!
- fi
- # end of 'COPYING'
- fi
- if test -f 'Makefile' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'Makefile'\"
- else
- echo shar: Extracting \"'Makefile'\" \(162 characters\)
- sed "s/^X//" >'Makefile' <<'END_OF_FILE'
- X
- CFLAGS= -O
- CC= gcc
- all: classify
- X
- classify: classify.c
- X $(CC) $(CFLAGS) classify.c -o classify
- X
- clean:
- X rm -f classify *.o a.out core *~
- X
- X.SCCS_GET:
- X co -l $*
- X
- END_OF_FILE
- if test 162 -ne `wc -c <'Makefile'`; then
- echo shar: \"'Makefile'\" unpacked with wrong size!
- fi
- # end of 'Makefile'
- fi
- if test -f 'README' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'README'\"
- else
- echo shar: Extracting \"'README'\" \(1514 characters\)
- sed "s/^X//" >'README' <<'END_OF_FILE'
- X `classify' is a utility for comparing many files to each
- other all at once. For example, if you manage a collection
- of diskless workstations, you can see which machines are
- using the same rc.local file by executing the command
- X
- X classify /export/root/*/etc/rc.local
- X
- X(or something like it) on the server machine.
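The grouping works like this: main() in classify.c compares each file only with the first member of every class found so far, and starts a new class when none of them matches. The following is a minimal in-memory sketch of that loop. It assumes strings stand in for files and strcmp() stands in for compare(). The helper name group_items() and the fixed-size representative array are hypothetical and are not part of classify.c.

```c
#include <string.h>

#define MAXITEMS 64   /* hypothetical limit; items past it are ignored */

/* Hypothetical sketch: assign each of n items to a class.
 * class_of[i] receives the 0-based class number of items[i], and the
 * return value is the number of classes found. As in classify's main
 * loop, each item is compared only with the first member (the
 * representative) of each existing class. */
int group_items(const char *items[], int n, int class_of[])
{
    int rep[MAXITEMS];      /* rep[c] = index of the first item in class c */
    int nclasses = 0;

    for (int i = 0; i < n && i < MAXITEMS; i++) {
        int c;
        for (c = 0; c < nclasses; c++)
            if (strcmp(items[i], items[rep[c]]) == 0)
                break;      /* matched class c's representative */
        if (c == nclasses)  /* no match: this item starts a new class */
            rep[nclasses++] = i;
        class_of[i] = c;
    }
    return nclasses;
}
```

For the items "a", "b", "a", "c", "b", this produces 3 classes, and the class numbers are 0, 1, 0, 2, 1. Only the representative of each class is consulted. As a result, the work per file grows with the number of classes, not with the number of files already seen.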
- X
- X If you want to edit the motd files on these
- workstations, you can use a script like this:
- X
- X foreach i ( `classify -1 /export/root/*/etc/motd` )
- X set ifamily=`classify -m $i /export/root/*/etc/motd`
- X $EDITOR $i
- X foreach j ($ifamily)
- X cp $i $j
- X end
- X end
- X
- which groups the motd files into classes of identical files,
- invokes the editor on one motd from each class, and then
- propagates the changes to the other motds in each class.
- X
- X The `test?' files are sample inputs so you can see what
- X`classify' is doing. Some of the `test' files differ only
- in the case of some of their letters; some have extraneous
- whitespace of various types, some are really the same as
- each other, and some are genuinely different.
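The test files exercise the -b, -w, and -f options described in classify.1. The skip condition in compare() in classify.c is written as a double negation. Read positively, it discards a blank or tab whenever -b or -w is given, and also discards a newline when -w is given. The following is a minimal sketch of that rule applied to two in-memory strings instead of two FILE streams. The function names skippable() and same_text() are hypothetical.

```c
#include <ctype.h>

/* Hypothetical restatement of the skip test in compare():
 * with -b or -w, blanks and tabs are ignored; with -w, newlines too. */
static int skippable(int c, int blankflag, int whiteflag)
{
    if (!blankflag && !whiteflag)
        return 0;
    return c == ' ' || c == '\t' || (whiteflag && c == '\n');
}

/* Compare two NUL-terminated strings the way compare() compares files:
 * skip ignorable whitespace on each side, optionally fold case, and
 * stop at the first mismatch or when both sides end together.
 * Returns 1 if the texts are the same under these rules, else 0. */
int same_text(const char *a, const char *b,
              int foldflag, int blankflag, int whiteflag)
{
    for (;;) {
        while (*a && skippable((unsigned char)*a, blankflag, whiteflag))
            a++;
        while (*b && skippable((unsigned char)*b, blankflag, whiteflag))
            b++;
        int c1 = (unsigned char)*a, c2 = (unsigned char)*b;
        if (foldflag) {             /* -f: treat upper and lower case alike */
            c1 = tolower(c1);
            c2 = tolower(c2);
        }
        if (c1 != c2)
            return 0;
        if (c1 == '\0')             /* both strings ended at the same point */
            return 1;
        a++;
        b++;
    }
}
```

Under this rule, -b discards every blank, so "a b" and "ab" compare equal. By contrast, diff -b only treats one run of blanks as equal to another run, so it would report those two lines as different.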
- X
- To-Do:
- X
- X `classify' might have better performance if it did
- X`stat' on files it was comparing to see what their i-numbers
- were; if two files are on the same device and have the same
- i-number, then they are necessarily identical, and don't
- need to be compared character-by-character. It might also
- be worthwhile to keep a cache of the first block or so of
- one file from each class, to save repeatedly opening and
- closing files.
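The following is a minimal sketch of the i-number shortcut. It assumes a POSIX system that provides stat(2). The helper name same_inode() is hypothetical and does not appear in classify.c. A result of 1 would let the caller report SAME without reading either file.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path1 and path2 name the same
 * file (same device and same i-number), 0 if they do not, and -1 if
 * either file cannot be examined with stat(2). */
int same_inode(const char *path1, const char *path2)
{
    struct stat st1, st2;

    if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
        return -1;
    return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

A result of 0 says nothing about content. Two distinct files can still hold identical text, so they would still need the character-by-character comparison.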
- X
- Mark-Jason Dominus
- mjd@saul.cis.upenn.edu
- END_OF_FILE
- if test 1514 -ne `wc -c <'README'`; then
- echo shar: \"'README'\" unpacked with wrong size!
- fi
- # end of 'README'
- fi
- if test -f 'classify.1' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'classify.1'\"
- else
- echo shar: Extracting \"'classify.1'\" \(3519 characters\)
- sed "s/^X//" >'classify.1' <<'END_OF_FILE'
- X.TH CLASSIFY 1 "25 Nov 1991"
- X.SH NAME
- classify \- group files that are identical (modulo whitespace)
- X.SH SYNOPSIS
- X.B classify
- X[
- X.B \-s
- X|
- X.B \-l
- X|
- X.B \-1
- X|
- X.B \-m
- X|
- X.B \-M
- X]
- X[
- X.B \-b
- X|
- X.B \-w
- X]
- X[
- X.B \-f
- X]
- X.if n .ti +5
- X[
- X.B \-\|\-
- X|
- X.B \-
- X]
- X.I filename1 filename2
- X[
- X.IR filename3 .\|.\|.
- X]
- X.SH DESCRIPTION
- X.B Classify
- is a program designed to help manage a set of files such as the
- X/etc/rc.local or /etc/motd files for a collection of diskless
- workstations.
- X.B Classify
- examines each of the files named in its arguments, groups
- them into
- X.IR classes ,
- with files that are almost identical in
- the same class, and files that are not very much alike in
- different classes, and outputs a brief report. For example:
- X.PP
- X.B sterno napalm
- X.br
- X.B moe larry curly
- X.br
- X.B holy_grail
- X.PP
- This output indicates that files
- X.BR sterno " and " napalm
- are identical in content, that
- X.BR moe ", " larry ", and " curly
- are all three the same as each other but different from
- X.BR sterno " and " napalm ", "
- and that
- X.B holy_grail
- is different from all the others.
- X.PP
- The other function of
- X.B classify
- is to produce a list of files which are almost the same as a
- single other file.
- X.B Classify
- ignores files which it cannot open for whatever reason,
- continuing on its way.
- X.PP
- X.SH OPTIONS
- X.br
- X.TP
- X.B \-l
- Select long output form. This format is unnecessary, but is still around
- for convenience and historical reasons. The `long' form of the example
- output above is:
- X.PP
- X.DS
- Class 1:
- X.br
- X sterno
- X.br
- X napalm
- X.PP
- Class 2:
- X.br
- X moe
- X.br
- X larry
- X.br
- X curly
- X.PP
- Class 3:
- X.br
- X holy_grail
- X.DE
- X.TP
- X.B \-s
- Select short output form: Print the names of the files in
- each class together on a single line. This is the default. See the
- example above.
- X.TP
- X.B \-1
- Select very short output form: Print on the standard
- output the name of only one file
- from each class.
- X.TP
- X.B \-M
- Produce on the standard output a list of all the
- X.IR filename s
- which are identical in content to
- X.IR filename1 .
- X.TP
- X.B \-m
- Like
- X.BR \-M ,
- but omit
- X.I filename1
- itself from the output.
- X.TP
- X.B \-b
- Ignore blanks and tabs when comparing the named files.
- X.TP
- X.B \-w
- Ignore blanks, tabs, and newline characters when comparing
- files.
- X.TP
- X.B \-f
- XFold in lower case: treat upper- and lower-case letters
- as equal when comparing files.
- X.TP
- X.B \-
- X.TP
- X.B \-\|\-
- Treat the following arguments as filenames so that you can
- specify filenames starting with a `-' character.
- X.TP
- X.B \-h
- Print summary of correct usage.
- X.LP
- If more than one of
- X.BR \-l ", " \-s ", " \-1 ,
- X.BR \-M ", "
- or
- X.B \-m
- is selected, all but the last one on the command line will
- be ignored.
- X.SH EXAMPLES
- To edit one /etc/motd from each class and then update the
- others:
- X.br
- X.DS L
- X foreach\ i\ (`classify\ -1\ /export/root/*/etc/motd`)
- X.br
- X set\ ifamily=`classify\ \-m\ $i\ /export/root/*/etc/motd`
- X.br
- X vi\ $i
- X.br
- X foreach\ j\ ($ifamily)
- X.br
- X cp\ $i\ $j
- X.br
- X end
- X.br
- X end
- X.DE
- X.SH SEE ALSO
- X.BR cmp (1),
- X.BR diff (1)
- X.SH DIAGNOSTICS
- X.TP 5
- X.BI "Couldn't open file " filename
- Indicates that file
- X.I filename
- does not exist, or that read privileges are lacking.
- X.TP 5
- X.BI "Unknown option: -" option
- Indicates that an option other than those listed under OPTIONS was given.
- X.SH AUTHOR
- Mark-Jason Dominus, University of Pennsylvania
- X.SH BUGS
- X.B Classify
- should be able to read the standard input as one of the
- files.
- X.PP
- Several performance improvements might be possible.
- X.PP
- X.B Classify
- becomes confused if one of the files it is classifying is removed
- before it is finished.
- X.PP
- The
- X.B \-l
- option is silly since its function can be duplicated with an
- X.B awk
- script.
- X
- END_OF_FILE
- if test 3519 -ne `wc -c <'classify.1'`; then
- echo shar: \"'classify.1'\" unpacked with wrong size!
- fi
- # end of 'classify.1'
- fi
- if test -f 'classify.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'classify.c'\"
- else
- echo shar: Extracting \"'classify.c'\" \(17437 characters\)
- sed "s/^X//" >'classify.c' <<'END_OF_FILE'
- X
- X/*
- X * `classify': Sort files into groups by content
- X * Copyright (C) 1991 Mark-Jason Dominus. All rights reserved.
- X *
- X * This program is free software; you can redistribute it and/or modify
- X * it under the terms of the GNU General Public License as published by
- X * the Free Software Foundation; either version 1, or (at your option)
- X * any later version.
- X *
- X * This program is distributed in the hope that it will be useful,
- X * but WITHOUT ANY WARRANTY; without even the implied warranty of
- X * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- X * GNU General Public License for more details.
- X *
- X * You should have received a copy of the GNU General Public License
- X * along with this program; if not, write to the Free Software
- X * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- X */
- X
- X#include <stdio.h>
- X#include <ctype.h>
- X#include <string.h>
- X
- X /* Return codes from `compare()' and macros for handling them. */
- X#define SAME 1
- X#define DIFFERENT 0
- X#define BADFILE1 4
- X#define BADFILE2 8
- X#define BADFILEBOTH (BADFILE1 | BADFILE2)
- X#define ERROR(RC) ((RC != SAME) && (RC != DIFFERENT))
- X
- X/* Allocate a new object and return a pointer to it. */
- X#define NEW(type) ((type *) malloc(sizeof(type)))
- X
- X/* Are strings a and b equal? ('equal', not 'eq') */
- X#define STREQ(a,b) (strcmp((a),(b)) == 0)
- X
- X/* Flags set by command-line options. */
- int foldflag = 0, blankflag = 0, whiteflag = 0;
- X
- X/* Format option and codes. */
- char formopt = 's';
- X
- X/* Explanation of data structure used in this program:
- X
- X Each 'masternode' is a linked list of filenames. The filenames in each
- X masternode are the names of files that are identical, modulo some
- X whitespace and upper/lower-case distinctions.
- X
- X The masternodes are linked together in a linked list called `list'.
- X
- X Example:
- X
- X list
- X |
- X V data next next
- X masternode------>filenode------->filenode------>NULL
- X | | |
- X | next | data | data
- X | V V
- X | filename1 filename2
- X |
- X V data next
- X masternode------>filenode------>NULL
- X | |
- X | next | data
- X | V
- X V filename3
- X NULL
- X
- X This would represent three files: filename1, filename2, and filename3,
- X of which filename1 and filename2 had the same contents, and filename3
- X was different from both.
- X
- X Note: if j is a pointer to a masternode, then j->data->data is the
- X first filename in j's masternode.
- X */
- X
- typedef struct s_fnode {
- X char *data;
- X struct s_fnode *next;
- X} filenode;
- X
- typedef struct s_mnode {
- X filenode *data;
- X struct s_mnode *next;
- X} masternode;
- X
- main(argc,argv)
- X int argc;
- X char *argv[];
- X{
- X /* Look at these absurd declarations! */
- X int i, compare(), match, numfileargs, parseargs();
- X void usage(), mappend(), fappend();
- X masternode *list, *j, *mnew, *newmasternode();
- X filenode *fnew, *k, *newfilenode();
- X FILE *checkit;
- X /* Didn't anyone ever tell you it wasn't polite to point? */
- X
- X /* Parse the arguments and obliterate switch options like `-f'. */
- X numfileargs = parseargs(argc,argv);
- X /* Anything that survives obliteration is assumed to be a filename. */
- X
- X /* No, no--what is the good of comparing only one file? */
- X if (numfileargs < 2) usage(argv[0]);
- X
- X /* Find the name of the first file on the command line. */
- X for (i=1; argv[i] == NULL; i++) ;
- X
- X /* This program has two essentially separate functions.
- X * One is to take a list of files and group identical ones.
- X * The other is to see which of files 2...n are identical to
- X * file 1.
- X *
- X * If you specify -m or -M, you get the second
- X * functionality. Otherwise, you get the first.
- X *
- X * What follows right here is the second functionality.
- X */
- X
- X if (formopt == 'm' || formopt == 'M') {
- X
- X /* The first file the user named is the one to check the others against.
- X */
- X char *master = argv[i];
- X
- X /* If the user said '-M', echo the name of the master
- X * file; if not, suppress it. */
- X if (formopt == 'M')
- X printf("%s\n",master);
- X
- X for (i += 1; i<argc; i++) {
- X if (argv[i] == NULL) continue;
- X
- X /* If for some reason the name of the master file appears more than
- X once on the command line, suppress it. (This happens all the time
- X in shell scripts.)
- X */
- X if (STREQ(argv[i], master)) continue;
- X
- X match = compare(master, argv[i]);
- X if (match == SAME)
- X printf("%s\n", argv[i]);
- X /* If `match' was an error return, `compare()' printed an error */
- X /* message for us and we need do nothing special. */
- X }
- X
- X exit(0); /* This part of the program can't fail. */
- X }
- X
- X /* If we're here, then the user didn't select -m or -M, and we
- X do the normal thing, which is to look at all the files and sort them
- X into groups.
- X */
- X
- X /* This next bit of code catches a peculiar bug: If it were not here, and
- X if we couldn't open the first file named on the command line, we would
- X put it into the linked list anyway (list->data->data = argv[i]) and
- X subsequent files would get checked against it, yielding many error
- X messages, much wasted time, and erroneous output--there would be a
- X `Class 1' with the bad file alone in it.
- X
- X Putting in this check allows us to make much simpler
- X list-initialization code. I hate writing special-case code for
- X starting off linked lists!
- X */
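- X while (i < argc &&
- X (argv[i] == NULL || (checkit = fopen(argv[i],"r")) == NULL)) {
- X if (argv[i] != NULL) /* skip options obliterated by parseargs() */
- X fprintf(stderr, "Couldn't open file %s.\n", argv[i]);
- X i++;
- X }
- X
- X if (i == argc) exit(0); /* Couldn't read *any* of the input files. */
- X fclose(checkit); /* Only reached if fopen() succeeded. */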
- X
- X /* Initialize linked lists */
- X list = newmasternode();
- X list->data->data = argv[i];
- X /* Wasn't that simple? Told you so. */
- X
- X for (i += 1; i < argc; i++) { /* Loop through filenames... */
- X if (argv[i] == NULL) continue; /* ... skipping nulls ... */
- X match = DIFFERENT;
- X j=list;
- X do {
- X /* ... matching the current file with the file at the head of each */
- X /* class-list ... */
- X match = compare(argv[i], j->data->data);
- X if (match == DIFFERENT) j = j->next;
- X /* ... until we run out of class lists or find a match or an error. */
- X } while (j && (match == DIFFERENT));
- X
- X /* Now, if there was an error, then... */
- X if (ERROR(match)) {
- X /* ... I hope it was in the current file--that's no problem; we just
- X obliterate it from the list of files to check, and move on, but...
- X */
- X if ((match & BADFILE1) == BADFILE1) {
- X argv[i] = NULL;
- X continue;
- X }
- X /* ... if the problem was with the file in the class list, I am very
- X upset, because it _was_ okay when I put it into the list.
- X (I have violated Steinbach's Guideline for Systems Programming:
- X ``Never test for an error condition you don't know how to
- X handle.'' But actually I could handle this; we could delete the
- X bogus file from the class-list in which it appears. This is a lot
- X of work and it will happen only very rarely and in bizarre
- X circumstances, so I choose not to bother. So sue me.
- X */
- X else if ((match & BADFILE2) == BADFILE2) {
- X fprintf(stderr,"WARNING:\tSomething went wrong with file %s\n",
- X j->data->data);
- X fprintf(stderr,"since the last time I looked at it.\n");
- X /* Yes, Virginia, this is correct behavior. */
- X }
- X }
- X
- X /* Okay, there was no error, but the current file was *not* like
- X any of the ones we've seen so far. Make a new classification and
- X put the current filename into it.
- X */
- X else if (match == DIFFERENT) {
- X mnew = newmasternode();
- X mnew->data->data = argv[i];
- X mappend(list,mnew);
- X }
- X /* Ah, we found a match--the current file is identical to the ones in */
- X /* the classification j->data. */
- X else {
- X#ifdef DEBUG
- X fprintf(stderr, "%s matched %s.\n", argv[i], j->data->data);
- X#endif
- X fnew = newfilenode();
- X fnew->data = argv[i];
- X fappend(j->data, fnew);
- X }
- X } /* for (i += 1; ... ) */
- X
- X /* We are out of the main loop and all the files have been handled,
- X one way or another. Now it is time to spit out the output.
- X */
- X
- X /* `formopt' is '1' if the user selected the `-1' option. It means
- X * that the program should not list every file in every class, as the
- X * other output forms do, but rather should just dump out
- X * a list of files each of which represents exactly one of the classes. */
- X if (formopt == '1') {
- X for (j=list; j; j=j->next)
- X printf("%s\n", j->data->data);
- X }
- X /* `formopt' is 's' if the user selected the '-s' option. That
- X * means that the program should make a short, awkable kind of
- X * output, with one line per class, filenames separated by a single
- X * space. Note that we do not number the lines. (I almost had it
- X * number the lines.) The idea is that if the user wanted the lines
- X * numbered, they would pipe the output through 'cat -n'. */
- X else if (formopt == 's') {
- X for (j=list; j; j=j->next) {
- X for (k = j->data; k; k=k->next)
- X printf("%s ", k->data);
- X printf("\n");
- X }
- X }
- X /* Here we make the nice long report. The temptation to add many
- X bells and whistles and have the program accept a format-specification
- X string and so on is very strong, but I will not give in to foolish
- X creeping featurism. At least, not any more than I already have.
- X Actually, a short-form option that puts the output in the form
- X 1 foo.c bar.c baz.c la.c
- X 2 la de da oakum yokum
- X 3 cruft FOCUS
- X 4 adventure
- X might be very useful, because as it is you can't really feed this
- X program's output to AWK in a reasonable way.
- X */
- X /* Note added in proof: I gave in to creeping featurism. See the
- X * '-s' option. Sigh. At least I did not make it number the lines. */
- X else {
- X for (j=list, i=1; j; j=j->next, i++) {
- X printf("\nClass %d:\n",i);
- X for (k = j->data; k; k=k->next) {
- X printf("\t%s\n",k->data);
- X }
- X }
- X }
- X
- X exit(0); /* Au 'voir! */
- X}
- X
- X/* This next `compare' routine is what I used to do, but there are good
- X reasons for not using either diff(1) or cmp(1):
- X
- X 1. Do not use diff(1) because it is too intelligent (intelligent ->
- X slow.) Diff tells you where the files differ and that is not what we
- X want--we just want to know if they are different or not.
- X
- X 2. Do not use cmp(1) because we want to use this program for comparing
- X things like /etc/rc.local and /etc/motd which are very likely to differ
- X only in a few whitespaces, and we want this program to report that such
- X files are identical, even though cmp says they're not.
- X
- X Maybe UNIX needs a nice, simple, flexible file-compare utility? Naah,
- X you can always string awk and sed and things onto the front of cmp. But
- X that's too slow for us here.
- X */
- X
- X/* Do not do this:
- X int
- X compare(path1,path2)
- char *path1, *path2;
- X{
- X char compare[1024];
- X
- X sprintf(compare,"cmp -s %s %s",path1,path2);
- X sprintf(compare,"diff -w %s %s > /dev/null 2>&1",path1,path2);
- X return((system(compare) >> 8 == 0) ? SAME : DIFFERENT );
- X}
- X*/
- X
- X/* So this is what we do instead. */
- X
- int
- X compare(path1, path2)
- char *path1, *path2;
- X{
- X FILE *file1, *file2;
- X int c1,c2;
- X
- X if ((file1 = fopen(path1,"r")) == NULL) {
- X fprintf(stderr, "Couldn't open file %s.\n", path1);
- X return(BADFILE1);
- X }
- X if ((file2 = fopen(path2,"r")) == NULL) {
- X fprintf(stderr, "Couldn't open file %s.\n", path2);
- X return(BADFILE2); /* For symmetry, even though this program will become
- X quite irate if `compare' ever returns this code.
- X */
- X }
- X
- X do {
- X do {
- X c1 = getc(file1);
- X /* You may need to make a Karnaugh map to understand this termination
- X condition, but it essentially means: if -b or -w is set, skip blanks
- X and TABs; if -w is set, skip newlines as well. I have tested it for
- X you, so you may assume it is doing the thing that the man page says it
- X does.
- X */
- X } while (! ((!blankflag && !whiteflag) ||
- X ((c1 != ' ' && c1 != '\t') && (c1 != '\n' || !whiteflag)))
- X ) ;
- X do {
- X c2 = getc(file2);
- X } while (! ((!blankflag && !whiteflag) || /* Ditto */
- X ((c2 != ' ' && c2 != '\t') && (c2 != '\n' || !whiteflag)))
- X ) ;
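- X /* The two skip loops above could share one helper. What follows is a
- X * sketch of a hypothetical nextchar() (the name is made up; it is not
- X * compiled into classify), with the loop condition simplified by
- X * De Morgan's laws:
- X *
- X * int
- X * nextchar(fp)
- X * FILE *fp;
- X * {
- X * int c;
- X * do
- X * c = getc(fp);
- X * while ((blankflag || whiteflag) &&
- X * (c == ' ' || c == '\t' || (c == '\n' && whiteflag)));
- X * return c;
- X * }
- X *
- X * after which the loops become c1 = nextchar(file1); c2 = nextchar(file2);
- X */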
- X
- X /* Fold case if requested with `-f' flag. */
- X if (foldflag) {
- X c1 = (isupper(c1) ? tolower(c1) : c1);
- X c2 = (isupper(c2) ? tolower(c2) : c2);
- X }
- X
- X if (c1 != c2) {
- X fclose(file1);
- X fclose(file2);
- X return DIFFERENT;
- X }
- X
- X } while (c1 != EOF && c2 != EOF);
- X
- X fclose(file1);
- X fclose(file2);
- X
- X /* If we're here, then both files were identical and we tapped out at */
- X /* least one of them. If we tapped out both, they really are identical. */
- X /* If, on the other hand, only one is finished, then it is a strict */
- X /* prefix of the other and so the two files are *not* the same. */
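- X /* Example, worked by hand: compare a file containing ab with one */
- X /* containing abc. The loop reads a/a and b/b, then EOF/c. Since */
- X /* EOF != 'c', the c1 != c2 test inside the loop has already returned */
- X /* DIFFERENT. So both files are always at EOF when we get here, and */
- X /* the else branch below is only a safety net. */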
- X if (c1 == EOF && c2 == EOF)
- X return SAME;
- X else
- X return DIFFERENT;
- X}
- X
- X/* Nyahh nyah! User is a big stupid-head! */
- void
- X usage(progname)
- char *progname;
- X{
- X char *tail;
- X tail = strrchr(progname,'/');
- X
- X if (tail) progname = tail+1;
- X fprintf(stderr,"Usage:\t %s [-1 | -s | -l | -m | -M] [-f] [-b | -w]\n",progname);
- X fprintf(stderr,"\tfile1 file2 [...]\n");
- X fprintf(stderr,"\n\nTry %s -h\t for help.\n", progname);
- X exit(1);
- X}
- X
- X/* I put this here 'cause I didn't want to write a man page. Duuhhhhh. */
- void
- X help()
- X{
- X fprintf(stderr,"Classify: Examine and group identical files.\n\n");
- X fprintf(stderr,"Flags:\n\t-f\tFold case in file comparisons.\n");
- X fprintf(stderr,"\t-b\tIgnore blanks and TABs in file comparisons.\n");
- X fprintf(stderr,"\t-w\tIgnore all whitespace in file comparisons.\n");
- X fprintf(stderr,"\t-1\tPrint the name of only one file from each class.\n");
- X fprintf(stderr,"\t-l\tPrint long-format output (default).\n");
- X fprintf(stderr,"\t-s\tPrint short-format output.\n");
- X fprintf(stderr,"\t-M\tPrint only names of files that match first file named.\n");
- X fprintf(stderr,"\t-m\tLike -M, but suppress first filename.\n");
- X return;
- X}
- X
- X/* Parse the args and set the flags.
- X We want the argument list to be free-form so you can mix filenames and
- X options. That is because I am a masochist. So to save trouble, we just
- X obliterate the flag arguments by setting them to NULL, and then we have
- X the main routine ignore NULL arguments if it sees any. Programmers who
- X say `but then you can't tell when you've reached the end of the arg list
- X because it is supposed to be a NULL-terminated array!' get a boot to the
- X head.
- X
- X Returns the number of non-flag arguments.
- X */
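- X/* Example, worked by hand from the code below: for the command line
- X classify -f a.c -w b.c
- X parseargs leaves argv as { "classify", NULL, "a.c", NULL, "b.c" },
- X sets foldflag and whiteflag, and returns 2. Bundled flags such as
- X -fw work too, because every character after the leading '-' is examined.
- X */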
- X
- int
- X parseargs(argc,argv)
- int argc;
- char *argv[];
- X{
- X int i, j, numnonflags = argc-1;
- X void usage(), help();
- X
- X for (i=1; i<argc; i++) {
- X if (argv[i][0] != '-') continue;
- X numnonflags -= 1;
- X if (argv[i][1] == '\0') { /* If flag is "-", stop parsing args */
- X /* Probably `-' should mean to read the stdin. I will put in
- X that feature three days after next tishabov.
- X
- X (Translation for gentiles: I will put the feature in on the
- X fourth Thursday of next week.)
- X */
- X argv[i] = NULL;
- X return numnonflags;
- X }
- X for (j=1; argv[i][j]; j++) {
- X switch (argv[i][j]) {
- X case '-': /* If flag is "--", stop parsing args */
- X if (j==1) {
- X argv[i] = NULL;
- X return numnonflags;
- X } /* Else we got a flag like -f-w, so ignore the second "-" sign. */
- X break;
- X case 'f':
- X foldflag = 1;
- X break;
- X case 'b':
- X blankflag = 1;
- X break;
- X case 'w':
- X whiteflag = 1;
- X break;
- X case 'l':
- X formopt = 'l';
- X break;
- X case 's':
- X formopt = 's';
- X break;
- X case '1':
- X formopt = '1';
- X break;
- X case 'h':
- X help(); /* ``Why does this function return?''
- X `` `Cause you're an idiot.''
- X ``Oh yeah. I forgot.''
- X */
- X exit(0);
- X break;
- X case 'm':
- X formopt = 'm';
- X break;
- X case 'M':
- X formopt = 'M';
- X break;
- X default:
- X fprintf(stderr, "Unknown option: -%c.\n", argv[i][j]);
- X usage(argv[0]);
- X }
- X }
- X if (argv[i][0] == '-') argv[i] = NULL; /* Obliterate flag arguments. */
- X }
- X return numnonflags;
- X}
- X
- X/* Manufacture a new masternode whose car is a new filenode. Return a */
- X/* pointer to the new masternode. */
- masternode *
- X newmasternode()
- X{
- X masternode *foo;
- X filenode *newfilenode();
- X
- X foo = NEW(masternode);
- X foo->next = NULL;
- X foo->data = newfilenode();
- X
- X return(foo);
- X}
- X
- X/* Manufacture a new filenode whose car is a NULL pointer. Return a */
- X/* pointer to the new filenode. */
- filenode *
- X newfilenode()
- X{
- X filenode *foo;
- X
- X foo = NEW(filenode);
- X foo->next = NULL;
- X foo->data = NULL;
- X
- X return(foo);
- X}
- X
- X/* head and tail are pointers to masternodes. (i.e., they are linked lists */
- X/* of masternodes.) Append tail to the end of head. (LISP people would */
- X/* call this operation `nconc'. I can't say the word `nconc' without */
- X/* bursting out laughing, so I called it `mappend' instead.) */
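- X/* For example, if head is A->B and tail is C->D, then after */
- X/* mappend(head, tail) the list head is A->B->C->D. Like nconc, this */
- X/* is destructive: no nodes are copied, tail is simply linked on. Note */
- X/* that head must not be NULL, because the loop below dereferences it. */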
- void
- X mappend(head,tail)
- masternode *head, *tail;
- X{
- X masternode *i;
- X
- X /* Find the end of the linked list `head' */
- X for (i=head; i->next; i = i->next) ;
- X
- X /* Concatenate. */
- X i->next = tail;
- X
- X return;
- X}
- X
- X/* This is the same as mappend, except it works on filenode-lists instead */
- X/* of masternode-lists. Big deal. */
- void
- X fappend(head,tail)
- filenode *head, *tail;
- X{
- X filenode *i;
- X
- X for (i=head; i->next; i = i->next) ;
- X
- X /* nconc! nconc! nconc! hahahaha! */
- X i->next = tail;
- X
- X return;
- X}
- X
- X
- END_OF_FILE
- if test 17437 -ne `wc -c <'classify.c'`; then
- echo shar: \"'classify.c'\" unpacked with wrong size!
- fi
- # end of 'classify.c'
- fi
- if test -f 'test0' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'test0'\"
- else
- echo shar: Extracting \"'test0'\" \(28 characters\)
- sed "s/^X//" >'test0' <<'END_OF_FILE'
- this is the forest primeval
- END_OF_FILE
- if test 28 -ne `wc -c <'test0'`; then
- echo shar: \"'test0'\" unpacked with wrong size!
- fi
- # end of 'test0'
- fi
- if test -f 'test1' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'test1'\"
- else
- echo shar: Extracting \"'test1'\" \(28 characters\)
- sed "s/^X//" >'test1' <<'END_OF_FILE'
- this is the forest primeval
- END_OF_FILE
- if test 28 -ne `wc -c <'test1'`; then
- echo shar: \"'test1'\" unpacked with wrong size!
- fi
- # end of 'test1'
- fi
- if test -f 'test2' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'test2'\"
- else
- echo shar: Extracting \"'test2'\" \(31 characters\)
- sed "s/^X//" >'test2' <<'END_OF_FILE'
- this is the forest primeval
- END_OF_FILE
- if test 31 -ne `wc -c <'test2'`; then
- echo shar: \"'test2'\" unpacked with wrong size!
- fi
- # end of 'test2'
- fi
- if test -f 'test3' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'test3'\"
- else
- echo shar: Extracting \"'test3'\" \(36 characters\)
- sed "s/^X//" >'test3' <<'END_OF_FILE'
- X
- this is the forest primeval
- X
- X
- END_OF_FILE
- if test 36 -ne `wc -c <'test3'`; then
- echo shar: \"'test3'\" unpacked with wrong size!
- fi
- # end of 'test3'
- fi
- if test -f 'test4' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'test4'\"
- else
- echo shar: Extracting \"'test4'\" \(28 characters\)
- sed "s/^X//" >'test4' <<'END_OF_FILE'
- THIS is the forest primeval
- END_OF_FILE
- if test 28 -ne `wc -c <'test4'`; then
- echo shar: \"'test4'\" unpacked with wrong size!
- fi
- # end of 'test4'
- fi
- echo shar: End of shell archive.
- exit 0
-